

STATE  OF  THE  ART 




Wide-area  information  servers  open  a  new  frontier  in  personal  and  corporate  information  services 

RICHARD  MARLON  STEIN 


The Library of Congress archives
roughly  25  terabytes  in  its  collec- 
tion. To  browse  through  this  vol- 
ume on  your  own  would  be  nearly 
impossible.  Wide-area  information  serv- 
ers supply  the  means  to  achieve  this  goal 
by  providing  the  user-interface  structure 
and  underlying  information-retrieval 
protocol  necessary  to  automatically  col- 
late, collect,  and  integrate  diverse  data 
streams.  WAISes  can  distill  the  contents 
of  vast  archives  into  neatly  manageable 
and  browsable  folders. 

On-line  information  services,  such  as 
BIX  and  CompuServe,  attest  to  the  need 
for  this  kind  of  technology.  Information 
has  acquired  a  commodity-like  status. 
While  not  on  a  par  with  wheat,  pork  bel- 
lies, or  gold  futures,  the  information-ser- 
vice industry  fills  a  vital  role.  The  next 
phase  of  information  commerce  will  add 
WAIS  capabilities  to  existing  on-line  ser- 
vices, opening  a  new  frontier  in  personal 
and  corporate  information  services. 




Intentions and Goals

Initiated  in  early  1989,  the  WAIS  engi- 
neering effort  is  spearheaded  by  Think- 
ing Machines  (Cambridge,  MA),  the 
manufacturer  of  the  Connection  Ma- 
chine, a  massively  parallel  supercom- 
puter (see  reference  1).  The  principal 
goal  of  the  research  project  is  to  demon- 
strate "how  current  technology  can  be 
used  to  open  a  market  of  information  ser- 
vices that  will  allow  a  user's  workstation 
to  act  as  librarian  and  information  col- 
lection agent  from  a  large  number  of 
sources."  (See  reference  2.)  WAISes  aim 
to  enhance  existing  information  services 
and  provide  a  utilitarian  mechanism  for 
the  industry. 


ILLUSTRATION: BON CHAN © 1991



MAY  1991  •BYTE     157 




BROWSING  THROUGH  TERABYTES 




Information  servers  already  provide 
direct  access  to  many  databases  and  ar- 
chive structures.  You  can  easily  check 
the  local  weather,  make  travel  reserva- 
tions, obtain  entertainment  schedules,  or 
browse  through  the  latest  stock-market 
quotes  on-line.  These  services  are  highly 
interactive,  charging  users  on  the  basis  of 
minutes  spent  on-line,  and  each  has  a 
unique  user  interface. 

WAISes  alleviate  unnecessary  user  in- 
teraction through  a  predominantly  com- 
puter-to-computer approach  to  remote 
information  retrieval.  By  minimizing 
human  interaction  with  a  remote  infor- 
mation server,  they  handle  requests  for 
information  expeditiously  and  inexpen- 
sively. WAISes  also  alleviate  unneces- 
sary complexity  by  moving  all  user  inter- 
action to  the  local  workstation  and  by 
having  WAIS  software  handle  all  trans- 
actions with  the  remote  server. 

On-line  servers  are  limited  in  their 
connectivity.  While  many  services,  such 
as  BIX,  CompuServe,  and  AppleLink, 
incorporate  wide-area  network  struc- 
tures, sharing  information  between  dif- 
ferent services  is  not  a  wholly  transpar- 
ent option.  This  restriction  constrains 
information  commerce  and  hampers  the 
circulation  of  potentially  useful  ideas. 

WAISes  circumvent  this  barrier  with  a 
standard  information-exchange  protocol 


BYTE ACTION SUMMARY


The  next  phase  of  informa- 
tion commerce  will  add  wide- 
area  information  server  capa- 
bilities  to  existing  on-line 
services.  WAISes  provide  the 
user-interface  structure  and 
the  underlying  information- 
retrieval  protocol  necessary 
to  automatically  collate,  col- 
lect, and  integrate  informa- 
tion from  various  sources. 
When  these  are  implement- 
ed, you  should  be  able  to  di- 
rectly access  such  sources 
as  the  Library  of  Congress 
and  the  myriad  of  newspa- 
pers, journals,  and  books. 




that  offers  unlimited  connectivity  and  re- 
trieval functionality.  All  servers  can  ap- 
ply the  WAIS  protocol  to  their  archive 
structures  to  conduct  information  re- 
trieval. (Unlimited  connectivity  also 
raises  concerns  of  security  and  privacy. 
See  the  text  box  "The  Right  to  Privacy" 
on  page  160.) 

Organized  and  coherent  information 
of  topical  importance  has  value.  Individ- 
uals and  companies  should  be  able  to 
market  their  information  to  the  widest 
possible  audience.  Current  on-line  ser- 
vices can't  easily  accomplish  this,  since 
their  connectivity  is  restricted. 

To  direct  your  information  to  the  best 
marketplace,  you  could  subscribe  to  mul- 
tiple on-line  sources  and  post  the  same 
message  on  all  of  them.  But  it  would  be 
more  efficient  to  post  the  data  on  one 
server  and  have  the  data,  or  an  abstract 
of  it,  broadcast  to  the  others.  Using  the 
WAIS  protocol,  WAISes  facilitate  this 
server  function. 

Suppose,  for  example,  you  have  re- 
viewed the  latest  set  of  RISC  micropro- 
cessor benchmarks,  taking  note  of  spe- 
cific architectural  advantages,  and  you 
wish  to  make  this  information  available 
to  others.  The  benchmark  review  is  kept 
on  your  home  computer  (i.e.,  the  local 
WAIS),  which  is  equipped  with  WAIS 
technology.  The  nearest  remote  WAIS,  a 
hub  within  a  network  of  servers,  also  has 
a  folder  for  RISC  microprocessors.  So 
you  make  a  posting  to  the  nearest  hub 
server  that  inserts  a  pointer  to  the  review 
on  your  home  computer. 

Everyone  with  a  computer  running  the 
WAIS  user-interface  software  can  pre- 
sent information  to  a  server  and  receive 
compensation  for  whatever  portion  of  it 
other  WAIS  subscribers  access.  The 
compensation  can  be  monetary,  or  you 
can  barter  your  information  for  someone 
else's. 

Even  publishers  of  books,  magazines, 
newspapers,  and  music  can  participate 
and  profit  from  WAISes.  For  example, 
how  much  money  could  a  newspaper  save 
in circulation costs if you received the
morning  paper  electronically  instead  of 
printed  on  paper?  Similarly,  how  much 
money  could  a  book  publisher  save  if  you 
purchased  a  new  best-selling  novel  elec- 
tronically instead  of  at  a  bookstore? 

Traditional  information  delivery  is  ex- 
pensive, and  costs  are  rising.  The  U.S. 
Postal  Service  frequently  raises  its  fees 
to  cover  increases  in  the  cost  of  handling 
and  transporting  information.  Tradition- 
al information  transport  also  represents  a 
significant  fraction  of  transport  volume 
and  collateral  energy  consumption. 
Moving  information  electronically  can 


result  in  enormous  savings. 

Computer  networks  such  as  Internet 
are  conduits  of  information  transport.  To 
replace  manual  transportation  methods, 
the  existing  electronic  infrastructure 
must  accommodate  the  newly  anticipated 
volume  of  traffic.  Plans  for  "a  national 
network  of  data  superhighways,"  which 
will  be  installed  within  the  next  few 
years,  are  under  way  (see  references  3 
and  4). 

A  principal  motivation  for  WAIS  tech- 
nology is  to  be  able  to  retrieve  topical  in- 
formation for  research  or  investigation, 
not  just  to  deliver  consumable  items  like 
newspapers  or  books.  Toward  this  end, 
WAISes  rely  on  a  novel  structure  for  in- 
formation retrieval,  the  dynamic  folder. 

To  use  a  WAIS,  you  formulate  a  ques- 
tion (see  figure  1),  find  the  information 
servers  that  provide  satisfactory  re- 
sponses, and  create  a  dynamic  folder. 
The  purpose  of  the  dynamic  folder  is  to 
constantly  or  periodically  update  its  con- 
tents with  new  material  on  the  subject. 

Formulating  a  question  is  natural  to  us 
all.  The  difficult  part  is  locating  the  per- 
tinent information  to  answer  it.  Manual- 
ly locating  the  information  can  be  labori- 
ous and  tedious.  WAISes  automate  the 
search-and-retrieval  process.  To  deter- 
mine which  servers  hold  the  information 
most  pertinent  to  your  question,  and 
where  you  should  submit  dynamic  fold- 
ers, you  may  want  to  consult  server  di- 
rectories. 

Server  Directories 

WAIS  directories  are  servers  that  sup- 
port a  directory-services  function.  They 
are  indexes  to  other  services  within  the 
WAIS  network  and  are  organized  to  help 
you  locate  information.  Like  telephone- 
directory  services,  WAIS  directories  list 
pointers  to  servers,  which  are  grouped 
according  to  content  and  function. 

A  directory-entry  header  contains  suf- 
ficient data  to  describe  the  service,  such 
as  an  English-language  description  of  the 
server,  the  parent  server  (if  the  server  is 
a  subsidiary  of  a  larger  one),  related 
servers,  contact  information  (including 
networks  and  human-interface  points), 
and  cost  information. 

The  local  workstation,  when  equipped 
with  a  WAIS,  should  maintain  a  direc- 
tory entry  that  includes  the  directory- 
entry  header,  a  locally  determined  rank, 
subscription  information  (if  any),  user 
comments,  and  the  time  of  last  contact. 
You  can  use  this  information  to  decide 
whether to contact the server and how to
handle  the  responses. 
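
The directory entry described above can be pictured as a small record. The following is only a sketch in Python: the class names, field names, and the `servers_worth_contacting` helper are illustrative assumptions, not part of the WAIS specification. It shows how a workstation might keep the server-supplied header together with its own rank and comments, and then choose which servers to contact.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DirectoryEntryHeader:
    """Data a server publishes about itself (field names are hypothetical)."""
    description: str                      # English-language description
    parent_server: Optional[str] = None   # set if the server is a subsidiary
    related_servers: List[str] = field(default_factory=list)
    contact: str = ""                     # network address or human contact
    cost: str = ""                        # free-form cost information


@dataclass
class LocalDirectoryEntry:
    """What the local workstation keeps about one server."""
    header: DirectoryEntryHeader
    rank: int = 0                         # locally determined; higher is better
    subscription: Optional[str] = None
    comments: str = ""
    last_contact: Optional[datetime] = None


def servers_worth_contacting(entries, min_rank):
    """Return the entries ranked at or above min_rank, best-ranked first."""
    keep = [e for e in entries if e.rank >= min_rank]
    return sorted(keep, key=lambda e: e.rank, reverse=True)


entries = [
    LocalDirectoryEntry(DirectoryEntryHeader("RISC benchmark reviews"), rank=8),
    LocalDirectoryEntry(DirectoryEntryHeader("Newswire archive"), rank=3),
    LocalDirectoryEntry(DirectoryEntryHeader("Encyclopedia"), rank=5),
]
chosen = servers_worth_contacting(entries, min_rank=5)
print([e.header.description for e in chosen])
```

Keeping the rank on the workstation rather than in the header reflects the text: the header is the server's own description, while the rank is a local judgment.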

By  using  content  navigation,  you  can 
find  the  most  appropriate  server  to 



[Figure 1: WAIStation question window. The question "Look for documents about recent developments in personal computers" is posed to the selected sources, and the Results pane lists matching newswire articles.]




WAIStation, a prototype user
interface  developed  by  the 
Thinking  Machines  wide- 
area  information  server  proj- 
ect staff,  embodies  many  functional  as- 
pects of  WAIS  technology.  Forming 
and  refining  queries  via  relevance  feed- 
back, server  selection,  and  dynamic 
folders  are  the  principal  features  that 
this  prototype  supports.  These  assets 
provide  a  powerful  tool  set  for  infor- 
mation retrieval.  While  WAIStation 
achieves  several  desirable  technical 
goals,  the  security  and  privacy  issues 
have  not  yet  received  serious  attention 
and  need  refinement. 

Security  and  privacy  issues  are  not 
specific  to  WAIStation  or  WAISes  in 
general, but are endemic, topical
concerns of the information-retrieval
industry as a whole. WAIS technology
seeks  to  extend  connectivity  through  the 
WAIS  protocol,  thus  intensifying  the 
urgency  of  security  measures  and  stan- 
dards. Greater  connectivity  promotes 
information  commerce,  but  it  also  adds 
to  the  risk  of  compromising  the  privacy 
and confidentiality of electronic
transactions.

Individuals  and  corporations  that 
subscribe  to  WAISes  must  safeguard 
proprietary  information.  The  tendency 
to  organize  information  within  a  com- 
puter for  ease  of  access  or  to  act  as  a 
convenient  archive  creates  a  security 


and privacy dilemma. And if the
sensitive data is located on a machine
with high connectivity, the risk is
multiplied.

A WAIStation that holds personal
information, such as tax forms, diaries,
business transactions, medical records,
or bank accounts, must be protected
from intrusion by unauthorized
individuals. A computer system storing
this information "knows" more about you
than you can instantly recall. Access to
this personal data must be protected,
controlled, and limited to authorized
individuals.

The  WAIS  protocol  is  an  application- 
layer  protocol  that  runs  over  X.25  com- 
munications, modems,  or  IEEE  802.3 
(Ethernet)  backbones.  Residing  beneath 
this  protocol  is  the  WAIStation  host 
computer  and  operating  system.  Ex- 
tracting information  from  the  server  de- 
pends on  access  granted  through  a  rec- 
ognition and  authentication  system  that 
the host computer operates. Only
authorized subscribers can access
information from the server.

The WAIS protocol is stateless, so
each transaction, whether a query or
document-retrieval process, exists in a
separate context at the server.
Subversion of the WAIS protocol, whether
intentional or accidental, might unlock or
bypass  a  server's  native  file-system  pro- 
tection structure.  If  it  did,  the  entire 


archive  contents  would  be  available  to 
the intruding party.

The WAIS protocol should be
noncorruptible and should detect privileged
transactions  (i.e.,  those  data  streams 
that  possess  restricted  command  se- 
quences). However,  to  be  effective  as  a 
noncorruptible  application-layer  pro- 
tocol, the  underlying  computer  system 
must  also  be  unbreachable. 

Unfortunately,  you  cannot  always 
guarantee  protection.  In  1988,  a  virus 
introduced  through  a  known  port  as- 
saulted computer  systems  attached  to 
Internet.  Subsequent  sleuthing  discov- 
ered that  a  remote  system  could  activate 
the  debug  mode  of  the  Unix  mailer, 
forcing the instigator into a privileged
state. The debug mode then permitted
the virus to propagate and multiply.

Can  a  rogue  dynamic  folder,  fash- 
ioned after  the  Internet  virus,  intention- 
ally access  information  from  strategic 
servers  running  WAIS  software?  How 
will  WAISes  safeguard  information 
against  illegal  intrusion? 

The  right  to  privacy  is  inalienable, 
and  WAIS  technology  or  any  enabling 
system  that  promotes  information  com- 
merce must  preserve  it.  A  cautionary 
approach toward implementing WAIS
technology is necessary and appropriate.
Several legal issues must be addressed
to secure both privacy and fair
business practice.




handle a query. For example, a question
on  RISC  microprocessor  benchmarks 
would  list  directory  entries  for  servers  as 
well  as  pointers  to  articles  on  the  subject. 
When  you  retrieve  a  document,  the  di- 
rectory entry  is  also  provided.  Thus,  you 
obtain  ranking  information  for  questions 
of  similar  content. 

Each  server,  then,  contains  informa- 
tion of  value  to  certain  subscribers.  The 
dynamic  folder  can  continuously  poll 
newspaper  servers  for  new  articles  as 
they  arrive  from  the  news  wires,  while  it 
would  probably  query  a  dictionary  or  en- 
cyclopedia server  only  once,  since  the 
content  changes  much  less  frequently. 

Policing  the  large  number  of  anticipat- 
ed servers  (in  the  tens  of  thousands)  re- 
quires an  independent  quality-control 


mechanism.  An  audit  of  the  server  direc- 
tory would  reflect  any  server  that  fre- 
quently returns  erroneous  information  or 
does  not  perform.  An  independent  agen- 
cy like  Consumer  Reports,  the  Better 
Business  Bureau,  or  other  watchdog 
groups  could  create  rating  servers,  which 
monitor  and  rate  other  servers  in  the 
directory. 

These  rating  servers  resemble  movie 
and  TV  critics.  Consumers  acquire  con- 
fidence in  the  reports  and  reviews  that 
certain  critics  issue  because  they  share 
similar  tastes.  Just  as  moviegoers  start 
to  trust  a  particular  reviewer  who  has 
agreed  with  them  on  past  movies,  WAIS 
users  will  begin  to  trust  the  specific  rat- 
ing services  that  agree  with  them. 

A  subscriber  base  generates  income 


for  a  server.  The  rating  servers  will  at- 
tract subscribers  as  well,  for  they  direct 
trends  in  the  information  marketplace.  In 
fact,  they  may  become  the  first  "infor- 
mation speculators"  as  a  by-product  of 
WAIS  technology. 

Dynamic  Folders 

A  folder,  like  those  found  on  the  Macin- 
tosh, provides  the  WAIS  framework  for 
organizing  questions.  A  folder  is  a  re- 
pository for  documents.  A  file  system,  in 
the  Macintosh  sense,  is  full  of  folders 
organized  in  a  tree  structure  that  sup- 
ports an  efficient  document-location 
mechanism. 

To  find  a  document  within  a  file  sys- 
tem, you  typically  use  the  find  com- 
mand under  Unix  or  Finder  on  the  Mac. 



Figure  2:  The  similar  to  function  lets  you  retrieve  more  documents  on  notepad 
computers  using  relevance  feedback.  You  then  might  initiate  a  search  for  additional 
documents  with  similar  content.  Selecting  text  from  a  section  of  a  retrieved  document 
helps  to  refine  subject-matter  searches  or  locate  collateral  information.  You  can  also 
use  the  selected  text  to  execute  a  new  query.  (Courtesy  of  Thinking  Machines  Corp.) 


With  one  of  these  tools,  you  can  locate 
the  position  of  a  file  and  gain  access  to  its 
contents.  Path-driven  locators  search  an 
information  base  for  a  document's  name, 
but  they  do  not  provide  a  means  to  exam- 
ine its  contents. 

Retrieving  documents  pertinent  to  a 
specific  question  requires  content  navi- 
gation (i.e.,  examining  the  contents  of  a 
document,  or  a  representative  abstract  or 
index  for  the  document,  for  its  relevance 
to  the  question).  The  similarity  between 
the  question  and  the  document's  index 
determines  a  retrieval  score,  an  indica- 
tion of  the  likelihood  that  the  document 
is  pertinent. 
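
The article does not say how a WAIS server computes the retrieval score. As one plausible illustration, the sketch below uses cosine similarity between word counts of the question and of a document's index text. The function names and the scoring method are assumptions made for this example, not the method used by Thinking Machines.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its alphanumeric words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieval_score(question, document_index):
    """Cosine similarity of the two word-count vectors (0.0 means no overlap)."""
    q = term_counts(question)
    d = term_counts(document_index)
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    return dot / norm if norm else 0.0


question = "RISC microprocessor benchmarks"
relevant = retrieval_score(
    question, "benchmarks for RISC microprocessor architectures")
unrelated = retrieval_score(question, "morning newspaper delivery costs")
print(relevant > unrelated)
```

A document whose index shares more words with the question scores higher, which is all a content-navigation interface needs in order to rank its results.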

WAISes  rely  on  the  dynamic  folder  to 
encapsulate  a  question.  In  its  most  pas- 
sive form,  it  contains  a  question  and  a  set 
of  servers  to  target.  The  WAIS  posts  the 
dynamic  folder  to  servers  of  known  qual- 
ity and  functionality,  and  then  query 
processing  begins. 

The  dynamic  folder  executes  a  remote 
query  that  sends  questions  to  the  remote 
servers.  There  the  questions  find  rele- 
vant information  and  return  a  list  of  doc- 
ument titles  (document  pointers)  encap- 
sulated within  the  originating  folder  to 
the  local  WAIS  system.  The  results  from 


the  query  may  initially  include  a  list  of 
documents  with  fair,  good,  or  high 
similarities. 
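
The round trip just described, in which a folder carries a question to several servers and comes back with ranked document pointers, might be sketched as follows. Everything here is hypothetical: the `DynamicFolder` class, the stand-in server functions, and the scores are invented for illustration and do not reflect the actual WAIStation code.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# A stand-in "server": takes a question, returns (title, score) pairs.
Server = Callable[[str], List[Tuple[str, float]]]


@dataclass
class DynamicFolder:
    question: str
    servers: Dict[str, Server]
    results: List[Tuple[str, str, float]] = field(default_factory=list)

    def run_query(self):
        """Send the question to each target server and keep ranked pointers."""
        found = []
        for name, search in self.servers.items():
            for title, score in search(self.question):
                found.append((name, title, score))
        # Highest-scoring document pointers come first.
        self.results = sorted(found, key=lambda r: r[2], reverse=True)
        return self.results


def news_server(question):
    # Made-up scores for illustration only.
    return [("Computer Firms See the Writing on the Screen", 0.9),
            ("Compaq Directors Approve Stock Split", 0.4)]


def encyclopedia_server(question):
    return [("Computer (encyclopedia entry)", 0.6)]


folder = DynamicFolder(
    question="recent developments in personal computers",
    servers={"newswire": news_server, "encyclopedia": encyclopedia_server},
)
for server_name, title, score in folder.run_query():
    print(server_name, title, score)
```

Each result keeps the name of the server it came from, so the pointer alone is enough to fetch the document later.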

Now  you  can  refine  your  query  strate- 
gy by  perusing  the  document  titles  to  de- 
termine which  are  the  most  appropriate 
documents.  WAIS  technology,  in  the 
form  of  the  WAIStation  user  interface 
(see  reference  5),  assists  this  process 
through  a  content-associativity  function 
known  as  similar  to. 

The  similar  to  function  informs  the 
WAIS  user  interface  that  a  document  is 
"interesting."  The  server  uses  this  infor- 
mation to  find  other  documents  that  are 
similar  to  the  one  you  have  chosen.  This 
search  strategy,  an  embedded  compo- 
nent of  WAISes,  represents  a  significant 
improvement  over  traditional  database 
methods,  such  as  Structured  Query  Lan- 
guage (SQL)  and  Boolean  search. 

This  form  of  query  execution  is  known 
as  relevance  feedback.  It  lets  you  extend 
the  query  to  incorporate  a  "more-like- 
that-one"  functionality  and  lets  you  re- 
trieve documents  that  have  similar  con- 
tents. The  WAIS  user  interface  is 
organized  around  the  English  language, 
and  English-language-oriented  query 
structures  are  easier  to  use  than  SQL. 




The similar to function is like
working with a reference librarian. First, you
state  the  topic  of  your  research,  which  the 
librarian  translates  into  queries.  After 
you  examine  the  results  of  the  queries, 
you  indicate  which  results  were  on  the 
mark;  thus,  the  librarian  gains  a  better 
understanding  of  your  needs  and  can  im- 
prove the  search. 

With  relevance  feedback,  WAISes  can 
retrieve  documents  with  greater  ease  and 
speed.  You  no  longer  need  to  alter  a  SQL 
Boolean operator to adjust the query
filter; instead, you can ask for "more
documents like this one."
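
One simple way to picture "more documents like this one" is query expansion: the most frequent new words of the document you marked as interesting are added to the original question. The sketch below assumes that approach. The stopword list and the `more_like_this` function are illustrative inventions, and the real similar to function may work quite differently.

```python
import re
from collections import Counter

# A tiny, hypothetical stopword list; a real system would use a larger one.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "and", "for", "on", "that", "by"}


def words(text):
    """Lowercase alphabetic words with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def more_like_this(question, interesting_doc, extra_terms=3):
    """Return the question's terms plus the document's most common new terms."""
    base = words(question)
    counts = Counter(w for w in words(interesting_doc) if w not in base)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return base + [w for w, _ in ranked[:extra_terms]]


doc = ("Note-pad computers let users write with a pen. Pen input on "
       "note-pad computers replaces the keyboard. The pen is special.")
expanded = more_like_this("personal computers", doc)
print(expanded)
```

The expanded term list can then be sent as a new query, so the user never edits a Boolean expression by hand.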

Dynamic  folders  can  also  possess  vi- 
tality,  which  gives  the  folder  a  continu- 
ous charter  to  execute  queries  periodical- 
ly and  update  its  contents  with  new 
material.  A  folder's  charter  expresses 
purpose,  intent,  and  the  goal  that  you 
want  the  query  to  accomplish.  You  can 
build  the  folder  to  periodically  poll  serv- 
ers known  to  receive  frequently  updated 
material  that  matches  its  charter. 
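
A folder's periodic polling could be driven by a per-server interval, as in this minimal sketch. The `servers_due` function, the interval values, and the use of plain seconds for time are assumptions chosen to keep the example self-contained. A newswire server is polled repeatedly, while an encyclopedia server is queried only once.

```python
def servers_due(poll_interval, last_polled, now):
    """Return the names of servers the folder should query at time `now`.

    poll_interval maps a server name to seconds between polls,
    or to None for a server that should be queried only once.
    last_polled maps a server name to the time of its last query.
    """
    due = []
    for name, interval in poll_interval.items():
        last = last_polled.get(name)
        if last is None:
            due.append(name)            # never queried yet
        elif interval is not None and now - last >= interval:
            due.append(name)            # interval has elapsed
    return due


intervals = {"newswire": 600, "encyclopedia": None}
first_pass = servers_due(intervals, {}, now=0)
second_pass = servers_due(intervals, {"newswire": 0, "encyclopedia": 0}, now=900)
print(first_pass, second_pass)
```

Because each poll costs money on a metered service, choosing the interval per server is one place where the cost concerns raised later in the article would apply.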

If  the  search  retrieves  an  interesting 
document,  WAISes  let  you  select  a  por- 
tion of  the  text  and  use  it  as  an  adjunct  to 
the  initial  query.  Selecting  text  from  a 
portion  of  a  document  that  may  contain 
some  particularly  topical  or  relevant  in- 
formation and  using  it  to  refine  the 
search  is  an  innovative  approach  for  ex- 
ploring subjects  (see  figure  2). 

WAISes  also  let  you  chain  questions  by 
taking  the  results  of  a  previous  search, 
starting  a  new  question  with  different 
subject  matter,  and  dragging  the  previ- 
ous results  into  the  similar  to  menu  box 
(see  figure  3).  Chaining  questions  can 
either  broaden  or  narrow  a  search,  de- 
pending on  the  relevance-feedback  re- 
sults. 

The  recursive  capacity  of  dynamic 
folders  to  initiate  "sibling"  folders  dem- 
onstrates the  WAIS  potential  to  harness 
and  refine  subject  matter.  Query  refine- 
ment alters  the  charter  of  a  dynamic 
folder.  Sibling  dynamic  folders  execute 
directed  searches  and  can  have  an  auton- 
omous authority  to  broaden  the  range  of 
server  choices. 

Controlling  the  extent  of  search  expan- 
sion is  a  critical  issue.  For  individuals, 
cost  can  be  an  overwhelming  concern. 
WAIS  technology  does  not  yet  contain  an 
accounting  system  to  govern  search  crite- 
ria. Participating  information  services 
will  have  to  engineer  this  element  of  the 
technology  themselves. 

WAIS  Protocol 

WAISes  promote  connectivity  and  access 
to  remote  electronic-information  sources 
through  a  standard  protocol,  the  WAIS 




protocol.  This  protocol  is  an  extension  of 
the  National  Information  Standards  Or- 
ganization (NISO)  Z39.50-1988  specifi- 
cation, which  defines  an  interface  to 
remote  information-retrieval  services 


and  library-protocol  applications.  The 
Z39.50  standard  is  the  backbone  of  the 
WAIS  protocol  and  the  foundation  for 
WAIS  applications  development. 

Incorporating  the  Z39.50  standard 
into  the  WAIS  protocol  frees  developers 
to  build  articulated  user  interfaces  for 
WAIS  applications.  The  interface  stan- 
dard isolates  the  server's  text-retrieval 
method,  such  as  SQL,  giving  the  applica- 
tion a  transparent  access  mode.  The  par- 
ticulars of  database  queries  are  hidden 
beneath the interface. A developer only
needs  to  be  sure  that  the  server  possesses 
an  equivalent  functionality  to  conduct 
remote  information-retrieval  transac- 
tions from  a  local  WAIS  workstation. 

Concealing  the  server's  implementa- 
tion through  the  WAIS  protocol  is  impor- 
tant in  another  respect  as  well.  Isolating 
the  implementation  implies  that  you  can 
specify  a  single,  more  palatable  query 
language.  The  WAIS  protocol  also  lets 
you  use  an  English-language-style  query 


Figure  3:  Chaining  questions  permits  you  to  use  a  query  on  multiple  information 
sources  by  opening  a  new  question  and  dragging  previous  query  results  into  the 
similar  to  field.  You  can  also  apply  the  similar  to  operation  to  invoke  a  new 
document  search,  as  in  this  example.  (Courtesy  of  Thinking  Machines  Corp.) 




lexicon  instead  of  cryptic  SQL  or  fourth- 
generation  languages.  When  you  find  a 
document  that  is  appropriate,  the  WAIS 
protocol  automatically  handles  the 
download  process  from  the  server.  This 
is  quite  different  from  existing  services, 
where  manual  file-capture  mechanisms 
require  vigilance.  With  the  WAIS  proto- 
col, all  documents  look  like  they  are 
local  to  your  system. 

The  WAIS  protocol  incorporates  two 
important  modifications  that  the  NISO 
Z39.50  standard  does  not  address.  First, 
it  permits  hypermedia  document  trans- 
port. Most  documents  today  are  com- 




posed  primarily  of  ASCII  text  codes  and 
sequences,  but  the  next  generation  of 
documents,  constructed  from  hyperme- 
dia and  multimedia  sources,  integrates 
images  and  fully  formatted  text.  These 
media  forms  are  rapidly  becoming  popu- 
lar and  conventional. 

Second,  the  WAIS  protocol  is  stateless 
for  the  server.  It  does  not  have  to  keep 
any  information  about  the  client  between 
transactions,  because  the  user's  state  is 
kept  on  the  local  workstation.  Every 
search  or  retrieval  operation  is  a  separate 
process.  The  contexts  are  decoupled 
under  the  statelessness  of  the  protocol. 
This  decoupling  lets  you  make  a  search,
store  away  the  document  pointer,  and  re- 
trieve it  later. 
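
A minimal sketch of that statelessness, assuming a hypothetical DocumentPointer made of a server name and a document ID (the article does not describe the real pointer format). The server object holds documents but records nothing about its clients, so a pointer saved from one request is all that a later request needs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentPointer:
    """Hypothetical pointer: enough to find a document again later."""
    server: str
    doc_id: str


class StatelessServer:
    """Holds documents, but nothing about any client between requests."""

    def __init__(self, name, documents):
        self.name = name
        self._documents = documents  # maps document ID -> text

    def search(self, word):
        # Each request carries everything the server needs to answer it.
        return [
            DocumentPointer(self.name, doc_id)
            for doc_id, text in sorted(self._documents.items())
            if word.lower() in text.lower().split()
        ]

    def retrieve(self, pointer):
        # No session lookup: the pointer alone identifies the document.
        return self._documents[pointer.doc_id]


server = StatelessServer("news", {
    "d1": "Parallel machines index text",
    "d2": "Gold futures rise",
})
saved = server.search("parallel")   # the client stores the pointers locally
text = server.retrieve(saved[0])    # and retrieves the document whenever it likes
```

Because the pointer is plain data, it could just as well be handed to another user, who would retrieve the same document with the same call.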

Further,  you  can  use  a  dynamic  folder 
to  pass  one  of  these  document  pointers  to 
someone  else  who  can  also  retrieve  the 
document.  A  document  pointer  is  like  an 
International  Standard  Book  Number  for 
the  electronic  age.  (The  ISBN  is  a  unique 
identification  assigned  to  each  publica- 
tion.) Passing  a  document  pointer  con- 
forms with  copyright  law  and  lets  you 

easily  return  to  the  document  source  in- 
stead of  making  copies. 

The  WAIS  protocol  is  designed  to 
transport  information  through  modems, 
X.25  communications,  or  network  backbones.
This  flexibility  provides  a  broad
framework  within  which  to  conduct
retrieval  transactions.  For  example,
with  a  portable  computer,  you  could  con- 
nect with  a  WAIS  hub  through  a  modem 
and  post  dynamic  folders,  directing  the 
query  results  to  be  routed  to  your  office 
system  for  later  examination. 

Retrieval  Technology 

The  computing  infrastructure  needed  to 
implement  WAISes  varies  with  a  server's 
functionality.  A  Library  of  Congress 
WAIS,  with  25  terabytes  of  data,  could 
not  expeditiously  dispatch  queries  and 
function  if  a  serial  computer  were  used  to 
process  the  information.  For  a  problem 
of  this  magnitude,  massive  parallelism  is 
needed.  The  Connection  Machine's 
Text-Retrieval  System  is  a  viable  infor- 
mation-retrieval system  for  gigabyte-size 
databases. 

The  DowQuest  service  from  Dow 
Jones  runs  on  the  Connection  Machine. 
The  service  incorporates  approximately 
1  gigabyte  of  original  text  derived  from 
over  400  sources.  The  Wall  Street  Journal,
the  Washington  Post,  Barron's,  Fortune,
Forbes,  and  several  regional  business
and  technical  journals  are  included,
covering  the  previous  eight  calendar
months.  The  search  time  with  a  100- 
word  query  composed  of  typed  English 
and  relevance  feedback  (e.g.,  "more  like 
that  one")  is  less  than  half  a  second.  The 
system  can  provide  access  to  many  giga- 
bytes of  text  and  to  thousands  of  users 
interactively. 
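
Relevance feedback of the "more like that one" kind can be illustrated with a toy word-count ranking. The sketch below is an assumption-laden example, not DowQuest's actual method: the functions tokenize, score, rank, and more_like are hypothetical, and scoring is a plain weighted count of shared words.

```python
from collections import Counter


def tokenize(text):
    """Split text into lowercase words."""
    return text.lower().split()


def score(query_terms, text):
    """Add up each query term's weight for every occurrence in the text."""
    counts = Counter(tokenize(text))
    return sum(weight * counts[term] for term, weight in query_terms.items())


def rank(query_terms, documents):
    """Return document IDs from best to worst match (ties broken by ID)."""
    return sorted(documents, key=lambda d: (-score(query_terms, documents[d]), d))


def more_like(query_terms, relevant_text):
    """Relevance feedback: fold a chosen document's words into the query."""
    expanded = Counter(query_terms)
    expanded.update(tokenize(relevant_text))
    return expanded


docs = {
    "a": "connection machine parallel search",
    "b": "wheat and gold futures",
    "c": "parallel index search",
}
query = Counter(tokenize("connection machine"))
first = rank(query, docs)                         # only "a" matches the typed query
second = rank(more_like(query, docs["a"]), docs)  # "c" now outranks "b"
```

After the user marks document "a" as relevant, its words join the query, so document "c", which shares "parallel" and "search" with it, rises above the unrelated document "b".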

The  projections  for  the  Connection 
Machine  system  indicate  that  when  it  is 
scaled  to  a  1-terabyte  database  with  10- 
word  queries,  obtaining  an  answer  with- 
in 10  seconds  or  less  is  highly  probable. 
This  performance  is  accomplished  by 
harnessing  the  Connection  Machine's 
65,536  separate  processors  to  execute  a 
parallel  index  algorithm  (see  reference 
6).  These  estimates  are  phenomenal  and 
truly  indicative  of  the  computing  power 
manifest  in  parallel  systems.  No  serial 
machine  can  even  come  close  to  this  level 
of  performance. 

The  Connection  Machine  system  generates
these  results  by  searching  the  entire
contents  of  an  archive,  not  a  representative
abstract  or  a  keyword  frequency
table.  Each  document  within  the  archive
is  used  to  determine  a  match.  This  is  not 
typical  for  systems  organized  around 
serial  computers,  and  it  is  another  dra- 


matic demonstration  of  parallel-comput- 
ing technology. 
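
The idea of dividing a whole-archive search among many processors can be sketched as follows. This is not the parallel indexed algorithm of reference 6, and it does not model the Connection Machine; it is a plain partitioned scan using Python's standard thread pool, with hypothetical function names, meant only to show every document being examined while the work is split across workers.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, word):
    """Examine every document in one partition; no abstract or summary table."""
    return [doc_id for doc_id, text in partition if word in text.lower().split()]


def parallel_search(documents, word, workers=4):
    """Split the archive across workers, scan each part, and merge the matches."""
    items = sorted(documents.items())
    partitions = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(scan_partition, partitions, [word] * workers))
    return sorted(doc_id for matches in results for doc_id in matches)


archive = {
    "d1": "parallel index search",
    "d2": "gold futures",
    "d3": "massively parallel supercomputer",
}
hits = parallel_search(archive, "parallel", workers=2)

# Share of a 1-terabyte archive per processor on a 65,536-processor machine,
# taking a terabyte as 2**40 bytes: 2**24 bytes, or 16 megabytes each.
bytes_per_processor = 2**40 // 65536
```

The closing arithmetic suggests why the scale matters: spread over 65,536 processors, even a terabyte leaves each one with only about 16 megabytes to handle.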

The  cost  of  a  system  like  the  Connec- 
tion Machine  runs  in  the  millions  of  dol- 
lars. But  a  Macintosh  with  a  100-mega- 
byte  hard  disk  drive  or  a  386-based  PC 
can  serve  the  typical  WAIS  user. 

Immense  Promise 

The  prototype  WAIS  user  interface  and 
protocol  are  currently  being  beta-tested 
at  Thinking  Machines,  Apple  Computer, 
and  Dow  Jones  News/Retrieval.  Think- 
ing Machines,  the  principal  developer  of 
the  WAIS  architecture  and  software, 
plans  to  share  the  WAIS  protocol  free  of 
charge  and  hopes  to  help  user-interface 
developers  build  interfaces  to  WAIS 
servers. 

While  still  a  research  project  that  is 
undergoing  development  and  refinement, 
the  WAIS  holds  immense  promise.  Information
commerce,  buoyed  by  the
widespread  acceptance  of  computer  systems
and  networks,  forces  individuals
and  companies  to  expedite  transactions
and  simplify  activities.  These  coveted
sources  of  efficiency  stand  out  as  prominent
allies  of  competitive  advantage.

ACKNOWLEDGMENT
I'd  like  to  thank  Annie  Komanecky,  Frank- 
lin Davis,  Ben  Rewis,  and  Brewster  Kahle 
of  Thinking  Machines  for  their  assistance 
during  the  preparation  of  this  article. 

REFERENCES 

1.  Hillis,  D.  The  Connection  Machine.
Cambridge,  MA:  MIT  Press,  1985.

2.  Kahle,  B.  "Wide  Area  Information 
Server  Concepts."  Thinking  Machines 
Technical  Memo  DR89-1.  Cambridge, 
MA:  Thinking  Machines  Corp.,  1989. 

3.  Markoff,  J.  "Computer  Project  Would 
Speed  Data."  New  York  Times,  8  June 
1990,  sec.  A,  p.  1. 

4.  Markoff,  J.  "Creating  a  Giant  Computer 
Highway."  New  York  Times,  2  Sept.  1990, 
Part  III,  p.  11. 

5.  "WAIStation:  A  User  Interface  for  Wide 
Area  Information  Servers  (User  Guide, 
Prototype  Version)."  Cambridge,  MA: 
Thinking  Machines  Corp.,  1990. 

6.  Stanfill,  C.,  R.  Thau,  and  D.  Waltz.  "A
Parallel  Indexed  Algorithm  for  Information
Retrieval."  Thinking  Machines  Corp.
Technical  Report  DR  90-2.  Cambridge,
MA:  Thinking  Machines  Corp.,  1990.

Richard  Marlon  Stein  is  a  software  con- 
sultant and  freelance  writer  from  Van 
Nuys,  California.  He  has  a  B.S.  in  phys- 
ics from  the  University  of  California  at 
Irvine.  You  can  reach  him  on  BIX  c/o
"editors."


